feat(devx): local dev setup for control plane and full end-to-end flow (MLI-6681) by lilyz-ai · Pull Request #823 · scaleapi/llm-engine

lilyz-ai · 2026-05-07T02:10:00Z

Summary

Adds a complete local development workflow for model-engine so developers can iterate on both control plane code and the full endpoint lifecycle without cloud credentials or prod images.

Control-plane-only mode (make dev-server):

Spins up Postgres + Redis via docker-compose
LOCAL=true activates fake queue/docker/k8s implementations (mirrors CIRCLECI=true)
Full gateway API available at :5000 with auth skipped — no k8s cluster needed

Full end-to-end mode (make dev-server-full + make dev-service-builder + make dev-k8s-cacher):

make kind-up + make kind-image creates a local kind cluster and loads model-engine:local into it
Service Builder picks up endpoint creation tasks from local Redis and creates real k8s Deployments in kind
K8s Cacher polls kind and writes endpoint status back to Redis
Echo server (model-engine:local) used as the inference container — no GPU required

Code fixes included:

service_builder/celery.py + celery_task_queue_gateway.py: onprem cloud provider now uses redis Celery backend instead of s3 — without this, the Service Builder writes results to Redis but the Gateway looks in S3, leaving endpoints stuck in PENDING
dependencies.py: LOCAL=true + cloud_provider=onprem falls through to real OnPremQueueEndpointResourceDelegate instead of the fake
env_vars.py: GIT_TAG defaults to "local" when LOCAL=true so k8s templates reference the correct model-engine:local image
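
The PENDING bug above comes down to a single provider-to-protocol decision that the Gateway and the Service Builder must make identically. A minimal sketch of that decision follows. The function name `resolve_backend_protocol` is hypothetical (the PR describes a module-level `backend_protocol` value, not a helper), and the fallback branch is an assumption, since the PR only states the onprem mapping.

```python
def resolve_backend_protocol(cloud_provider: str) -> str:
    """Pick the Celery result backend protocol for a cloud provider.

    Only the onprem -> redis mapping is stated in the PR. The fallback
    below is an illustrative assumption standing in for the other branches.
    """
    if cloud_provider == "onprem":
        # The fix: onprem previously resolved to s3.
        return "redis"
    # Assumption: remaining providers are elided and shown as an s3 default.
    return "s3"


# Both processes must resolve the same protocol; otherwise one side writes
# task results where the other never looks.
gateway_protocol = resolve_backend_protocol("onprem")
builder_protocol = resolve_backend_protocol("onprem")
assert gateway_protocol == builder_protocol == "redis"
```

Keeping the decision in one shared helper would be one way to stop the two modules from drifting apart again, though the PR itself patches each module in place.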

New files:

docker-compose.local.yml — Postgres 15 + Redis 7 with healthchecks and persistent volume
service_configs/service_config_local.yaml — HMI config for local services
model_engine_server/core/configs/local-full.yaml — onprem infra config for kind
Makefile — all dev targets in one place

Test plan

make dev-up && make dev-migrate && make dev-server — gateway starts, GET /v1/model-endpoints returns 200
make kind-up && make kind-image — kind cluster created, model-engine:local loaded
make dev-server-full + make dev-service-builder + make dev-k8s-cacher — all three processes start cleanly
POST a sync CPU endpoint with the echo server image → pod appears in kubectl --context kind-llm-engine get pods -n model-engine and endpoint transitions to READY
Existing unit tests pass: make test
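
The first step of the test plan can be automated with a small smoke check. This is a sketch only: the port (:5000) and path (/v1/model-endpoints) come from the PR text, while the helper names are hypothetical and the absence of an auth header assumes that "auth skipped" means no credentials are needed at all.

```python
import urllib.request

GATEWAY_URL = "http://localhost:5000"  # port taken from the PR summary


def build_list_endpoints_request(base_url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build the GET request used by the first test-plan step."""
    return urllib.request.Request(f"{base_url}/v1/model-endpoints", method="GET")


def gateway_is_up(base_url: str = GATEWAY_URL, timeout: float = 5.0) -> bool:
    """Return True if the local gateway answers the list call with HTTP 200."""
    request = build_list_endpoints_request(base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: connection refused,
        # timeouts, and non-2xx responses all land here.
        return False


if __name__ == "__main__":
    print("gateway up:", gateway_is_up())
```

Run it after `make dev-server` has started; a `False` result usually means the server or its backing services are not up yet.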

Closes MLI-6681

🤖 Generated with Claude Code

Greptile Summary

Adds make dev-server (control-plane only, fake k8s/queue) and make dev-server-full / make dev-service-builder / make dev-k8s-cacher (full kind-based end-to-end) workflows; backing services are Postgres 15 + Redis 7 via a new docker-compose.local.yml.
Fixes a backend/broker mismatch for onprem: both celery_task_queue_gateway.py and service_builder/celery.py now correctly use redis as the Celery result backend for cloud_provider == "onprem", and dependencies.py routes LOCAL=true to fake queue delegates for non-onprem configs while still using the real OnPremQueueEndpointResourceDelegate for the full local flow.
OTLPMetricExporter is imported inside the shared SDK availability guard in correlation.py; this silently expands the OTel requirement to include opentelemetry-exporter-otlp-proto-grpc, which is only listed in vllm-specific requirements and not in the main requirements.txt.

Confidence Score: 4/5

Safe to merge for the intended local-dev purpose; one P1 in correlation.py could silently disable tracing in non-standard environments

Core bug fixes (celery backend protocol, dependency wiring) are correct and consistent. The P1 in correlation.py is isolated to environments that have opentelemetry-sdk without the OTLP exporter, which is non-standard given the existing requirements layout, so real-world impact is low but the architectural concern is valid.

model-engine/model_engine_server/common/startup_tracing/correlation.py — the OTLPMetricExporter guard import

Important Files Changed

| Filename | Overview |
| --- | --- |
| model-engine/model_engine_server/common/startup_tracing/correlation.py | Adds OTLPMetricExporter import to the shared SDK availability guard, silently tightening OTel requirements beyond what main requirements.txt provides |
| model-engine/model_engine_server/infra/gateways/celery_task_queue_gateway.py | Correctly adds onprem to the redis backend_protocol branch, consistent with the service_builder change and fixing the S3/Redis backend mismatch |
| model-engine/model_engine_server/service_builder/celery.py | Correctly extends backend_protocol to use redis for onprem, matching the gateway change and resolving the stuck-in-PENDING bug |
| model-engine/model_engine_server/api/dependencies.py | LOCAL+non-onprem correctly routes to FakeQueueDelegate; LOCAL flag added to Redis task queue branch so control-plane-only dev mode works without onprem config |
| model-engine/model_engine_server/common/env_vars.py | GIT_TAG defaults to 'local' when LOCAL=true and validation check correctly skips the error; prevents startup failure in local dev without GIT_TAG set |
| model-engine/Makefile | Adds all dev targets with both LOCAL_ENV and FULL_LOCAL_ENV configurations; ML_INFRA_SERVICES_CONFIG_PATH is now pinned |
| model-engine/docker-compose.local.yml | Clean Postgres 15 + Redis 7 compose file with healthchecks; Postgres uses persistent volume, Redis is ephemeral (appropriate for local dev) |
| model-engine/model_engine_server/core/configs/local-full.yaml | onprem config for kind with celery_broker_type_redis: true; correctly enables Redis broker+backend path for the full local flow |
| model-engine/service_configs/service_config_local.yaml | Uses cache_redis_onprem_url (correct field) to avoid cloud-provider assertions; previously-noted issue with cache_redis_aws_url has been addressed |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Control-plane-only ["Control-plane-only (make dev-server)"]
        A[LOCAL=true\ncloud_provider=aws] --> B[FakeQueueDelegate]
        A --> C[Redis TaskQueueGateway\nlocalhost:6379]
        A --> D[FakeDockerRepository]
    end

    subgraph Full["Full end-to-end (make dev-server-full + builder + cacher)"]
        E[LOCAL=true\ncloud_provider=onprem] --> F[OnPremQueueDelegate]
        E --> G[Redis TaskQueueGateway\nlocalhost:6379]
        G -->|Celery task| H[Service Builder\nredis broker + redis backend]
        H -->|k8s Deployment| I[kind cluster\nmodel-engine:local]
        I -->|status| J[K8s Cacher\nwrites to Redis]
        J --> G
    end

    subgraph Infra
        K[(Postgres\nlocalhost:5432)]
        L[(Redis\nlocalhost:6379)]
    end

    C --- L
    G --- L
    A --- K
    E --- K
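
The two branches of the flowchart above can be read as a single selection rule over `LOCAL` and `cloud_provider`. The sketch below illustrates that rule only. The three delegate classes are empty stand-ins, `Settings` and `select_queue_delegate` are hypothetical names, and the ordering of the checks is an assumption about how dependencies.py achieves the described behavior.

```python
from dataclasses import dataclass


class FakeQueueDelegate:
    """Stand-in for the fake delegate used in control-plane-only mode."""


class OnPremQueueDelegate:
    """Stand-in for the real onprem delegate used in the full local flow."""


class CloudQueueDelegate:
    """Hypothetical placeholder for whatever non-local deployments use."""


@dataclass(frozen=True)
class Settings:
    local: bool          # mirrors LOCAL=true
    cloud_provider: str  # mirrors cloud_provider in the infra config


def select_queue_delegate(settings: Settings):
    # Check onprem first so LOCAL=true + onprem still reaches the real
    # delegate instead of being captured by the LOCAL shortcut.
    if settings.cloud_provider == "onprem":
        return OnPremQueueDelegate()
    if settings.local:
        return FakeQueueDelegate()
    return CloudQueueDelegate()


control_plane_only = select_queue_delegate(Settings(local=True, cloud_provider="aws"))
full_flow = select_queue_delegate(Settings(local=True, cloud_provider="onprem"))
print(type(control_plane_only).__name__, type(full_flow).__name__)
```

The point of the ordering is that `LOCAL=true` alone must not decide the outcome; the full flow sets it too and still needs real queues.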

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
model-engine/model_engine_server/common/startup_tracing/correlation.py:15-17
**OTLP exporter import tightens OTel availability requirement silently**

`OTLPMetricExporter` is placed in the same `try/except ImportError` block as the core SDK availability check. This means any environment that has `opentelemetry-api` + `opentelemetry-sdk` installed but NOT `opentelemetry-exporter-otlp-proto-grpc` will now get `OTEL_AVAILABLE = False` and all trace correlation will silently be skipped. The exporter is only listed in vllm-specific requirements (`inference/vllm/requirements.txt`), not in the main `requirements.txt`, making this a fragile dependency for a shared utility. The import should be in its own nested `try/except` or removed entirely if `OTLPMetricExporter` isn't actually instantiated in this file.

Reviews (10): Last reviewed commit: "Merge branch 'main' into lilyz-ai/mli-66..."

Greptile also left 1 inline comment on this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds a one-command local development workflow for the model engine control plane so developers can iterate on gateway/service-builder code without building prod images or touching live infra.

- docker-compose.local.yml: spins up Postgres 15 + Redis 7
- service_configs/service_config_local.yaml: HMI config for local services
- Makefile: dev-up / dev-migrate / dev-server / dev-down / test targets
- LOCAL=true env var now activates fake queue/docker implementations (parallel to existing CIRCLECI=true path) and skips GIT_TAG requirement
- README: new "Control Plane Local Setup" section with full walkthrough

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…G_PATH

- service_config_local.yaml: switch from cache_redis_aws_url to cache_redis_onprem_url so the Redis URL is resolved before the cloud_provider assertion fires, fixing startup failure for non-AWS configs
- Makefile: pin ML_INFRA_SERVICES_CONFIG_PATH to default.yaml so local dev is not affected by a developer's ambient infra config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- README: add ML_INFRA_SERVICES_CONFIG_PATH to the manual env-var snippet so developers with non-AWS ambient configs don't accidentally hit the cloud_provider assertion
- docker-compose.local.yml: mount a named volume for Postgres so the database survives dev-down/dev-up cycles

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replaces the manual until-loops in dev-up with `docker compose up --wait`, which blocks until healthchecks pass and exits non-zero if they fail, eliminating the infinite spin on container crash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
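
For scripts that need the same fail-fast behavior outside of make, the command can be wrapped as below. This is a sketch, not the Makefile's recipe: the helper names are hypothetical, and the compose file name is taken from the PR's file list. `--wait` is a Docker Compose v2 flag that waits for services to be running or healthy and implies detached mode.

```python
import subprocess
from typing import List

COMPOSE_FILE = "docker-compose.local.yml"  # file name from the PR's file list


def compose_up_command(compose_file: str = COMPOSE_FILE) -> List[str]:
    """Build the command that starts the backing services and waits for health."""
    return ["docker", "compose", "-f", compose_file, "up", "--wait"]


def start_backing_services(compose_file: str = COMPOSE_FILE) -> None:
    """Start Postgres and Redis; raise if a healthcheck never passes.

    check=True turns the non-zero exit from a failed healthcheck into a
    CalledProcessError, so callers fail fast instead of polling forever.
    """
    subprocess.run(compose_up_command(compose_file), check=True)


if __name__ == "__main__":
    start_backing_services()
```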

lilyz-ai · 2026-05-07T02:35:15Z

@greptile review

lilyz-ai · 2026-05-07T02:35:21Z

/greptile

Extends the local dev setup so the complete control plane → Service Builder → k8s inference pod flow can be tested locally without cloud credentials.

Changes:

- local-full.yaml: new onprem infra config pointing to localhost Redis/kind
- dependencies.py: LOCAL=true + cloud_provider=onprem falls through to real Redis queue delegate instead of the fake (enabling full k8s flow)
- service_builder/celery.py: fix onprem to use redis backend not s3
- env_vars.py: default GIT_TAG to "local" when LOCAL=true so k8s templates reference the correct model-engine:local image loaded into kind
- Makefile: kind-up/kind-down/kind-image targets + dev-server-full, dev-service-builder, dev-k8s-cacher targets using FULL_LOCAL_ENV
- README: full end-to-end setup section with step-by-step instructions, example endpoint creation, and flow table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
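
The GIT_TAG default described in this commit can be illustrated as follows. This is a sketch under assumptions: `resolve_git_tag` and `image_reference` are hypothetical names, the way `LOCAL` is parsed is a guess, and the error type is arbitrary. What the PR does state is that the tag defaults to "local" under LOCAL=true and that the validation error is skipped in that case.

```python
import os
from typing import Mapping


def resolve_git_tag(env: Mapping[str, str] = os.environ) -> str:
    """Return the image tag, defaulting to "local" in local dev."""
    # Assumption: LOCAL is treated as a case-insensitive "true" string.
    is_local = env.get("LOCAL", "").lower() == "true"
    git_tag = env.get("GIT_TAG") or ("local" if is_local else "")
    if not git_tag:
        # Outside local dev a missing tag is still an error.
        raise ValueError("GIT_TAG must be set when LOCAL is not true")
    return git_tag


def image_reference(env: Mapping[str, str] = os.environ) -> str:
    """Build the image name that the k8s templates would reference."""
    return f"model-engine:{resolve_git_tag(env)}"


print(image_reference({"LOCAL": "true"}))
```

With `LOCAL=true` and no `GIT_TAG`, the reference resolves to `model-engine:local`, which matches the image that `make kind-image` loads into kind.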

The gateway's module-level backend_protocol had the same aws/gcp/azure mapping as service_builder/celery.py. Without this fix, the Service Builder writes task results to Redis but the Gateway looks in S3, leaving endpoints stuck in PENDING under the kind-based full local flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lilyz-ai · 2026-05-07T03:17:18Z

@greptile review

lilyz-ai · 2026-05-07T03:17:23Z

/greptile

The exporter package was imported unconditionally under the OTEL_AVAILABLE flag which only checked the base SDK, not the exporter. Include it in the try block so OTEL_AVAILABLE stays False when the exporter is absent, fixing the ImportError that caused run_unit_tests_server to fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…trol-plane-local-devx

…chema gateway

- Reformat correlation.py and celery.py to satisfy black
- Move noqa comment to the from...import( line so ruff F401 is suppressed correctly
- Pass schema_generator=GenerateJsonSchema() (new required kwarg) to get_definitions() and get_openapi_path() in live_model_endpoints_schema_gateway, creating a fresh instance per route since pydantic rejects reuse

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…oes not have this param

The param was added to fix a local test failure (FastAPI 0.110.0 requires it) but FastAPI 0.135.1 (pinned in requirements.txt, used by CI) does not accept it, causing mypy call-arg errors. Revert to the original signature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…xample

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greptile-apps · 2026-05-13T00:15:37Z

+    from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import (  # noqa: F401
+        OTLPMetricExporter,
+    )


OTLP exporter import tightens OTel availability requirement silently

OTLPMetricExporter is placed in the same try/except ImportError block as the core SDK availability check. This means any environment that has opentelemetry-api + opentelemetry-sdk installed but NOT opentelemetry-exporter-otlp-proto-grpc will now get OTEL_AVAILABLE = False and all trace correlation will silently be skipped. The exporter is only listed in vllm-specific requirements (inference/vllm/requirements.txt), not in the main requirements.txt, making this a fragile dependency for a shared utility. The import should be in its own nested try/except or removed entirely if OTLPMetricExporter isn't actually instantiated in this file.
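
One way to apply the "own nested try/except" suggestion is sketched below. The flag name `OTLP_METRICS_AVAILABLE` is hypothetical, and the two core imports are assumptions about what the availability guard in correlation.py checks; only the exporter import path is taken from the diff above.

```python
# Core availability: only the API and SDK decide whether tracing is usable.
try:
    from opentelemetry import trace  # noqa: F401
    from opentelemetry.sdk.trace import TracerProvider  # noqa: F401

    OTEL_AVAILABLE = True
except ImportError:
    OTEL_AVAILABLE = False

# Exporter availability is tracked on its own flag, so a missing
# opentelemetry-exporter-otlp-proto-grpc no longer flips OTEL_AVAILABLE.
OTLP_METRICS_AVAILABLE = False
if OTEL_AVAILABLE:
    try:
        from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import (  # noqa: F401
            OTLPMetricExporter,
        )

        OTLP_METRICS_AVAILABLE = True
    except ImportError:
        pass

print(f"otel={OTEL_AVAILABLE} otlp_metrics={OTLP_METRICS_AVAILABLE}")
```

Code that instantiates the exporter would then check `OTLP_METRICS_AVAILABLE`, while trace correlation keeps depending on `OTEL_AVAILABLE` alone.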


lilyz-ai and others added 2 commits May 7, 2026 01:59

chore: bump model-engine image tag to latest main (2e9d007)

a9453f5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greptile-apps Bot reviewed May 7, 2026


Comment thread model-engine/service_configs/service_config_local.yaml Outdated

Comment thread model-engine/Makefile Outdated

Comment thread model-engine/Makefile Outdated

greptile-apps Bot reviewed May 7, 2026


Comment thread model-engine/README.md

lilyz-ai and others added 2 commits May 7, 2026 02:32

lilyz-ai and others added 2 commits May 7, 2026 03:07

lilyz-ai changed the title ~~feat(devx): local control plane dev setup (MLI-6681)~~ feat(devx): local dev setup for control plane and full end-to-end flow (MLI-6681) May 7, 2026

lilyz-ai and others added 6 commits May 7, 2026 03:30

Merge remote-tracking branch 'origin/main' into lilyz-ai/mli-6681-con…

63c513f

…trol-plane-local-devx

docs(devx): replace curl endpoint example with launch-python-client e…

dd8e8c0

…xample Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into lilyz-ai/mli-6681-control-plane-local-devx

f781623

greptile-apps Bot reviewed May 13, 2026

lilyz-ai wants to merge 13 commits into main from lilyz-ai/mli-6681-control-plane-local-devx
